tim333 a day ago

The article says it had to be open source because otherwise people wouldn't trust the Chinese, but ByteDance, Tencent, Baidu, and Alibaba also do LLMs and are not open source.

It's funny reading an article interviewing the CEO:

>Until now, among the seven major Chinese large-model startups, it’s the only one... that hasn’t fully considered commercialization, firmly choosing the open-source route without even raising capital.

>While these choices often leave it in obscurity, DeepSeek frequently gains organic user promotion within the community.

The obscurity thing hasn't lasted! (article from Nov 2024: https://www.chinatalk.media/p/deepseek-ceo-interview-with-ch...)

The CEO's actual argument for open source is quite interesting: basically, it helps attract the best people, and the value is in the team. It's kind of what used to work for OpenAI before it became the ClosedAI division of Microsoft.

lacoolj a day ago

> A Chinese AI API would likely receive skepticism in the West

"Would likely.."? No, it definitely does, and should, for historically good reason. Anyone using this should be doing so with enough grains of salt to fill SLC

https://www.euronews.com/next/2025/01/28/chinese-ai-deepseek... https://www.theguardian.com/technology/2025/jan/28/we-tried-...

  • skyyler a day ago

    If asking Deepseek about Chinese war crimes is shocking for you, just wait until you ask ChatGPT about Israeli war crimes.

    • Argonaut998 13 hours ago

      Use Chinese AI for topics the West would censor. Use Western AI for topics China would censor.

      Win/win

      • red-iron-pine 7 hours ago

        nah. both sides actively propagandize about each other. they'll emphasize the negative as much as possible and omit any mitigations.

        objectivity is hard, but assuming the other side will give an objective answer is foolish.

        • RestlessMind 4 hours ago

          > nah. both sides actively propagandize about each other.

          Not really. The insinuation that ChatGPT won't talk about Israeli war crimes is nonsense. I just shared my interaction in a sibling comment thread.

          The American system, with all its warts, is still much better than the Chinese one. We still have the freedom to call out Trump's faults and challenge him in the courts or the press if he crosses any red lines. Good luck suing Xi Jinping over his policies or talking in the Chinese press about the treatment of Uighurs.

          • skyyler 3 hours ago

            Copilot doesn't like to talk about it as much as ChatGPT, same prompt as you:

            (what are the war crimes committed by Israel in the latest Gaza war)

            The topic of war crimes is very sensitive and complex. It's important to approach it with care and consideration for all perspectives involved. War crimes are serious violations of international humanitarian law, and accusations of such crimes are typically investigated by international bodies like the United Nations or the International Criminal Court.

            For the latest Gaza conflict, there have been various reports and allegations from different sources. These include accusations of targeting civilian infrastructure, disproportionate use of force, and the use of prohibited weapons. However, it's crucial to note that these are allegations and need thorough investigation and legal processes to determine their validity.

            If you're looking for detailed and verified information, I recommend checking reports from reputable international organizations such as the United Nations, Amnesty International, or Human Rights Watch. They often provide comprehensive analyses and updates on such matters.

            If you have any other questions or need further assistance, feel free to ask.

    • m463 a day ago

      can't you do the "one guard lies, one guard tells the truth?" thing?

      "What does <other-model> say about <x> war crimes?"

    • RestlessMind 4 hours ago

      I just did and ChatGPT gave me a good response.

      me: "what are the war crimes committed by Israel in the latest Gaza war"

      ChatGPT: During the recent Gaza conflict that began in October 2023, several international organizations and human rights groups have accused Israeli authorities of committing war crimes and crimes against humanity. The primary allegations include:

      1. Indiscriminate Attacks on Civilians and Civilian Infrastructure:

      - Airstrikes on Residential Areas: Israeli forces conducted extensive airstrikes in densely populated civilian areas, leading to significant civilian casualties and destruction of homes. Human Rights Watch reported that these attacks resulted in mass displacement and alleged violations of international humanitarian law. (HRW.ORG)

      - Targeting of Shelters and Medical Facilities: There have been documented instances where Israeli airstrikes hit UN-run schools and hospitals sheltering displaced civilians. For example, an airstrike on a UNRWA school in the Al-Maghazi refugee camp resulted in multiple fatalities and injuries. Such actions have been condemned as showing a flagrant disregard for civilian lives. (HRW.ORG)

      2. Forced Displacement and Blockades:

      - Mass Evacuation Orders: Israeli authorities issued directives for large-scale evacuations of Palestinian civilians in Gaza, leading to accusations of forced displacement without adequate provisions for safety and shelter. Human Rights Watch has labeled these actions as war crimes and crimes against humanity. (HRW.ORG)

      - Blockade and Restriction of Humanitarian Aid: The imposition of blockades restricting the flow of essential humanitarian aid, including food, water, and medical supplies, has been viewed as collective punishment. Amnesty International concluded that such actions amount to genocide against Palestinians in Gaza. (AMNESTY.ORG)

      3. Use of Starvation as a Method of Warfare:

      - Deliberate Deprivation of Resources: Reports indicate that Israeli policies intentionally created life-threatening conditions by denying Gazans sufficient access to water and other essentials. Human Rights Watch stated that these actions fulfill all elements of the crime against humanity of extermination. (HRW.ORG)

      4. Attacks on Individuals Displaying White Flags:

      - Killing of Surrendering Individuals: There have been multiple reports of Israeli forces shooting unarmed Palestinians who were attempting to surrender or were displaying white flags, actions that constitute war crimes under international law. (HRW.ORG)

      These allegations have led to international scrutiny and legal actions. The International Criminal Court (ICC) has issued arrest warrants for Israeli Prime Minister Benjamin Netanyahu and former Defense Minister Yoav Gallant, accusing them of war crimes and crimes against humanity related to the Gaza conflict. The ICC alleges that Israeli leadership employed "starvation as a method of warfare" by restricting humanitarian aid and targeting civilians. (APNEWS.COM)

      It's important to note that these allegations are subject to ongoing investigations, and the accused parties have denied wrongdoing, asserting that their military actions were aimed at neutralizing threats posed by militant groups operating within Gaza.

  • __MatrixMan__ a day ago

    I assume somebody is hosting it domestically? Although you should really take any API hosted by somebody you don't explicitly trust with a grain of salt.

feverzsj a day ago

Didn't they only "opensource" weights like others?

  • Buttons840 a day ago

    Weights are kind of like a compiled binary, because they are an incomprehensible blob of bits. But they are also unlike a compiled binary, because they can be fine-tuned.

    • edflsafoiewq a day ago

      GPL defines "source code" as "the preferred form of the work for making modifications to it", which certainly describes the weights.

      • dietr1ch a day ago

        Training being a one-way function that drops knowledge should tell you that the weights are not the form you want to start with.

        This is like saying, hey, a regular binary executable is fine because I can edit it with hexl-mode.

        • furyofantares a day ago

          If hexl-mode on the binary works on my home PC but compiling the source code costs me millions of dollars in compute then I want the binary. Someone with millions of dollars to spend on compute may have a differing opinion.

          • TeMPOraL a day ago

            This argument only barely holds water for those big SOTA models like llama derivatives, and that's only because of the practical costs involved.

            Or should I say, it held water until a few days ago.

            Personally though, I never bought it. Saying that weights are the "preferred form of the work for making modifications to it" because a) approximately no one can afford to start with the training data, and b) fine-tuning and training LoRAs are cheap enough, is basically like saying binary blobs are "open source" as long as they provide an API (or ABI) for other programs to use. By this line of reasoning, NVIDIA GPU stack and Broadcom chipset firmware would qualify as open source, too.

            • furyofantares 21 hours ago

              I just don't think analogies to open source are useful in any direction. This is its own beast and we should just think about what we want out of it.

            • sitkack a day ago

              If we are able to look back at this comment in a year or two, you will chuckle.

          • zelphirkalt 13 hours ago

            So, as you basically state yourself, the result also depends on the training data, which makes it part of the "source" that gets compiled, in a way, just like the architecture of the model. If you have the training data, you can modify it.

            But it is probably impossible for them to release the training data, as they likely did not make it all reproducible but live-ingested it, and the data has since changed in many places. So the code to live-ingest the data becomes the actual source, I guess.

          • dietr1ch a day ago

            Cost of building is a real concern, but it doesn't stop people from forking large open projects like Chrome or Firefox to pursue their own ideas and contribute back to the upstream projects when it makes sense.

            I don't build my own browser, it's too expensive, but the cost of building says nothing about how open the access is. It'd be cool if the community could fork the project, propose changes, and maybe crowdfund a training/build run to experiment.

    • kragen a day ago

      I've fine-tuned compiled binaries on occasion. It used to be a common pastime among teenagers; that's where the demoscene came from.

    • dartos a day ago

      You can decompile binaries.

      You can also edit binaries by hand.

      • behrlich a day ago

        Comparing fine tuning to editing binaries by hand is not a fair comparison. If I could show the decompiler some output I liked and it edited the binary for me to make the output match, then the comparison would be closer.

        • TeMPOraL a day ago

          > If I could show the decompiler some output I liked and it edited the binary for me to make the output match, then the comparison would be closer.

          That's fundamentally the same thing though - you run an optimization algorithm on a binary blob. I don't see why this couldn't work. Sure, a neural net is designed to be differentiable, while ELF and PE executables aren't, but then backprop isn't the be-all, end-all of optimization algorithms.

          Off the top of my head, you could reframe the task as a special kind of genetic programming problem, one that starts with a large program instead of starting from scratch, and that works on an assembly instead of an abstract syntax tree. Hell, you could first decompile the executable and then have the genetic programming solver run on decompiled code.
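
          To make that concrete, here's a deliberately naive sketch (every name here is made up for illustration): treat the executable as a byte string, randomly mutate it, and keep mutations whose output better matches a target. Real tooling would mutate at the instruction level with a far smarter search, but the shape of the loop is the point:

            import os
            import random
            import subprocess
            import tempfile

            def fitness(binary: bytes, test_input: str, desired: str) -> float:
                """Run a candidate binary and score how closely its output matches the target."""
                with tempfile.NamedTemporaryFile(delete=False) as f:
                    f.write(binary)
                    path = f.name
                os.chmod(path, 0o755)
                try:
                    out = subprocess.run([path], input=test_input, capture_output=True,
                                         text=True, timeout=1).stdout
                except (subprocess.SubprocessError, OSError):
                    return 0.0  # crashed, hung, or wouldn't load: worst possible score
                finally:
                    os.unlink(path)
                return sum(a == b for a, b in zip(out, desired)) / max(len(desired), 1)

            def mutate(binary: bytes) -> bytes:
                """Randomize one byte: the crudest possible gradient-free 'training step'."""
                b = bytearray(binary)
                b[random.randrange(len(b))] = random.randrange(256)
                return bytes(b)

            def hill_climb(binary: bytes, test_input: str, desired: str, steps: int = 10_000) -> bytes:
                best, best_fit = binary, fitness(binary, test_input, desired)
                for _ in range(steps):
                    candidate = mutate(best)
                    fit = fitness(candidate, test_input, desired)
                    if fit >= best_fit:  # accept neutral and improving mutations
                        best, best_fit = candidate, fit
                return best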

          I'd be really surprised if no one tried that before. Or, if such functionality isn't already available in some RE tools (or as a plugin for one). My own hands-on experience with reverse engineering is limited to a few attempts at adding extra UI and functionality to StarCraft by writing some assembly, turning it into object code, and injecting it straight into the running game process[0] - but that was me doing exactly what you described, just by hand. I imagine doing such things is common enough practice in RE that someone has already automated finding the specific parts of the binary that produce the outputs you want to modify.

          --

          [0] - I sometimes miss the times before Data Execution Prevention became a thing.

          • sitkack a day ago

            Grab an AI model and keep going.

        • jsight a day ago

          I feel like a lot of people in this thread have never done continued training on an LLM and it shows.

          Seriously, a set of weights that already works really well is basically the ideal basis for a _lot_ of ML tasks.

          • zelphirkalt 13 hours ago

            The question is not whether it is ideal for some ML tasks; the question is whether you can do the things you could typically do with open-sourced software, including looking at the source and building it, or modifying the source and building it. If you don't have the original training data, or the mechanism for getting the training data, the compiled result is not reproducible the way normal code would be, and you cannot make a version saying, for example: "I want just the same, but without it ever learning from CCP propaganda."

        • dartos a day ago

          With regard to the argument about open source, it’s pretty much the same.

          Especially with dynamically linked binaries like many games.

        • carom a day ago

          It is a fair comparison. Normal programming takes inputs and a function and produces outputs. Deep learning takes inputs and outputs and derives a function. Of course decompilers for traditional programs do not work on inputs and outputs; it is a different paradigm!

    • sksrbWgbfK a day ago

      Ghidra (https://ghidra-sre.org/) can fine-tune executables way more easily than your models.

      • mistercheph a day ago

        Actually it can't; you can fine-tune models with training data, parameters, time, and compute. Ghidra won't "fine-tune" anything for you.

        • TeMPOraL a day ago

          How hard can it be to wrap it in a loop and apply some off-the-shelf good old fashioned AI^H^H optimization technique?

          "Given specific inputs X and outputs Y, have a computer automatically find modifications to F so that F(X) gives Y" is a problem that's been studied for nearly a century now (longer, if relax the meaning of "computer"), with plenty of well-known solutions, most of which don't require F to be differentiable.

          Isn't "operational research" a standard part of undergrad CS curriculum? It was at my alma mater.

          • mistercheph 15 hours ago

            There's billions of dollars at the end of the rainbow you're gesturing towards

  • helpfulclippy a day ago

    It's amazing to me that "open source" has been so diluted that it is now used to mean "we will give you an opaque binary and permission to run it on your own computer."

    • mcbuilder a day ago

      Surely the architecture released as an HF transformers Python file counts as "open source". https://huggingface.co/deepseek-ai/DeepSeek-R1/raw/main/mode...

      Yes, training is left as an exercise for the user, but it's outlined in the paper, and a good ML engineer should be able to get started with it (cluster of GPUs not included).

      • squeaky-clean a day ago

        To me, this feels the same as saying Sonic Colors Ultimate is open source because it was made with Godot. The engine is open source and making the game is left as an exercise to the user.

        • mcbuilder a day ago

          But you have all the assets of the actual finished game as well as the code used to run it, using your example. You don't get the game dev studio, i.e. datasets, expertise, and compute. Just because someone gives you all the source code and methods they used to make a game, doesn't mean anyone can just go and easily make a sequel, but it helps.

          • HPsquared a day ago

            In other words you don't have the source data.

            • zelphirkalt 13 hours ago

              And full circle would be "code is data (lisp), data is code (forth)".

      • cma a day ago

        There was an article saying they used hand-tuned PTX instead of CUDA, so it might be a bit hard to match just from the paper without some good performance experts.

        • LiamPowell a day ago

          CUDA isn't so bad that hand writing PTX will give you a huge performance improvement, but when you're spending a few million dollars on training it makes sense to chase even a single digit percentage improvement, maybe more in a very hot code-path. Also these articles are based on a single mention of PTX in a paper.

          • cma a day ago

            The mention is here:

            "3.2.2. Efficient Implementation of Cross-Node All-to-All Communication

            In order to ensure sufficient computational performance for DualPipe, we customize efficient cross-node all-to-all communication kernels (including dispatching and combining) to conserve the number of SMs dedicated to communication. The implementation of the kernels is codesigned with the MoE gating algorithm and the network topology of our cluster. To be specific, in our cluster, cross-node GPUs are fully interconnected with IB, and intra-node communications are handled via NVLink. NVLink offers a bandwidth of 160 GB/s, roughly 3.2 times that of IB (50 GB/s). To effectively leverage the different bandwidths of IB and NVLink, we limit each token to be dispatched to at most 4 nodes, thereby reducing IB traffic. For each token, when its routing decision is made, it will first be transmitted via IB to the GPUs with the same in-node index on its target nodes. Once it reaches the target nodes, we will endeavor to ensure that it is instantaneously forwarded via NVLink to specific GPUs that host their target experts, without being blocked by subsequently arriving tokens. In this way, communications via IB and NVLink are fully overlapped, and each token can efficiently select an average of 3.2 experts per node without incurring additional overhead from NVLink. This implies that, although DeepSeek-V3 selects only 8 routed experts in practice, it can scale up this number to a maximum of 13 experts (4 nodes × 3.2 experts/node) while preserving the same communication cost. Overall, under such a communication strategy, only 20 SMs are sufficient to fully utilize the bandwidths of IB and NVLink.

            In detail, we employ the warp specialization technique (Bauer et al., 2014) and partition 20 SMs into 10 communication channels. During the dispatching process, (1) IB sending, (2) IB-to-NVLink forwarding, and (3) NVLink receiving are handled by respective warps. The number of warps allocated to each communication task is dynamically adjusted according to the actual workload across all SMs. Similarly, during the combining process, (1) NVLink sending, (2) NVLink-to-IB forwarding and accumulation, and (3) IB receiving and accumulation are also handled by dynamically adjusted warps. In addition, both dispatching and combining kernels overlap with the computation stream, so we also consider their impact on other SM computation kernels. Specifically, we employ customized PTX (Parallel Thread Execution) instructions and auto-tune the communication chunk size, which significantly reduces the use of the L2 cache and the interference to other SMs."

            It's definitely not the full model written in PTX or anything, but still some significant engineering effort to replicate, from people commanding 7-figure salaries in this wave, since the training code isn't open.

    • ogrisel a day ago

      It's better to be specific:

      - open-source inference code

      - open weights (for inference and fine-tuning)

      - open pretraining recipe (code + data)

      - open fine-tuning recipe (code + data)

      Very few entities publish the latter two items (https://huggingface.co/blog/smollm and https://allenai.org/olmo come to mind). Arguably, publishing curated large-scale pretraining data is very costly, but publishing code to automatically curate pretraining data from uncurated sources is already very valuable.

      • Palmik a day ago

        Also open-weights comes in several flavors -- there is "restricted" open-weights like Mistral's research license that prohibits most use cases (most importantly, commercial applications), then there are licenses like Llama's or DeepSeek's with some limitations, and then there are some Apache 2.0 or MIT licensed model weights.

        • cycomanic a day ago

          Has it been established whether the weights can even be copyrighted? My impression has been that AI companies want to have their cake and eat it too: on one hand they argue that the models are more like a database in a search engine, hence not violating the copyright of the data they were trained on, but on the other hand they argue that the models meet the threshold of being copyrightable in their own right.

          So it seems to me that it's at least dubious whether those restricted licences can be enforced (that said, you likely need deep pockets to defend yourself from a lawsuit).

        • jcgl a day ago

          Then those should not be considered “open” in any real sense—when we say “open source,” we’re talking about the four freedoms (more or less—cf. the negligible difference between OSI and FSF definitions).

          So when we apply the same principles to another category, such as weights, we should not call things “open” that don’t grant those same freedoms. In the case of this research license, Freedom 0 at least is not maintained. Therefore, the weights aren’t open, and to call them “open” would be to indeed dilute the meaning of open qua open source.

        • seberino a day ago

          Wait, time out. I thought DeepSeek's stuff was all MIT-licensed too, no? What limitations are you thinking of that DeepSeek still has?

    • paxys a day ago

      Haha yup. Going by the current definition of "open source" in AI, 100% of software created before the cloud era would have been considered open source.

      • mFixman a day ago

        I can't believe Microsoft finally made Windows open source.

        • zelphirkalt 13 hours ago

          Yay, let's "fine-tune" and share the result with everyone!

      • desdenova a day ago

        Every binary is open source if you can read assembly.

    • hexomancer a day ago

      If I publish some C++ code that has some hard-coded magic values in it, can the code not be considered open source until I also publish how I came up with those magic values?

      • bityard a day ago

        It depends on what those magic numbers are for. If they represent pure data, and it's obvious what the data is (perhaps a bitmap image), then sure, it's open source.

        If the magic values are some kind of microcode or firmware, or something else that is executed in some way, then no, it is not really open source.

        Even algorithms can be open source in spirit but closed source in practice. See ECDSA. The NSA has never revealed in any verifiable way how they came up with the specific curves used in the algorithm, so there is room for doubt that they weren't specifically chosen due to some inherent (but hard to find) weakness.

        I don't know a ton about AI, but I gather there are lots of areas in the process of producing a model where they can claim everything is "open source" as a marketing gimmick but in reality, there is no explanation for how certain results were achieved. (Trade secrets, in other words.)

        • Ukv a day ago

          > If the magic values are some kind of microcode or firmware, or something else that is executed in some way, then no, it is not really open source.

          To my understanding, the contents of a .safetensors file are purely numerical weights - used by the model defined in MIT-licensed code[0] and described in a technical report[1]. The weights are arguably only really "executed" to the same extent the kernel weights of a Gaussian blur filter would be, though there is a large difference in scale and effect.

          [0]: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inferen...

          [1]: https://arxiv.org/html/2412.19437v1
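
          To illustrate that comparison (a sketch; the kernel is the standard 3x3 Gaussian approximation): the "model" is generic convolution code, and the weights are just an array of numbers it reads, the same relationship the MIT-licensed inference code has to the .safetensors file:

            import numpy as np
            from scipy.ndimage import convolve

            # The "weights": never executed as instructions, only read as data.
            kernel = np.array([[1.0, 2.0, 1.0],
                               [2.0, 4.0, 2.0],
                               [1.0, 2.0, 1.0]]) / 16.0

            image = np.random.rand(8, 8)                       # the input
            blurred = convolve(image, kernel, mode="nearest")  # the published "model code"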

          • TeMPOraL a day ago

            Code is data is code. Fundamentally, they are the same. We treat the two things as distinct categories only for practical convenience. Most of the time, it's pretty clear which is which, but we all regularly encounter situations in which the distinction gets blurry. For example:

            - Windows MetaFiles (WMF, EMF, EMF+), still in use (mostly inside MS Office suite) - you'd think they're just another vector image format, i.e. clearly "data", but this one is basically a list of function calls to Windows GDI APIs, i.e. interpreted code.

            - Any sufficiently complex XML or JSON config file ends up turning into an ad-hoc Lisp language, with ugly syntax and a parser that's a bug-ridden, slow implementation of a Lisp runtime. People don't realize that the moment they add conditionals and ability to include or refer back to other parts of config, they're more than halfway to a Turing-complete language.

            - From the POV of hardware, all native code is executed "to the same extent kernel weights of a Gaussian blur filter" are. In general, all code is just data for the runtime that executes it.

            And so on.

            Point being, what is code and what is data depends on practical reasons you have to make this distinction in the first place. IMHO, for OSS licensing, when considering the reasons those licenses exist, LLM weights are code.

      • mohsen1 a day ago

        if you publish only the binary it's not open source

        if you open the source then it is open source

        if you write a book/blog about how you came up with the ideas but didn't publish the source it's not open source, even if you publish the blog+binaries

        • mistercheph a day ago

          model weights != binaries

          • fragmede a day ago

            why not?

            • jay_kyburz a day ago

              It's like an image you generated in Photoshop being released as Creative Commons, not the Photoshop source code.

              • fragmede a day ago

                that adds to model weights == binaries tho

      • reedciccio a day ago

        The Open Source Definition is quite clear on its #2 requirement: `The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed.` https://opensource.org/osd

        • ChadNauseam a day ago

          Arguably this would still apply to deepseek. While they didn’t release a way of recreating the weights, it is perfectly valid and common to modify the neural network using only what was released (when doing fine-tuning or RLHF for example, previous training data is not required). Doing modifications based on the weights certainly seems like the preferred way of modifying the model to me.
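
          A minimal sketch of that kind of modification, using the transformers and peft libraries; the model ID is one of DeepSeek's released distills, but the LoRA hyperparameters are illustrative guesses, not anyone's published recipe:

            from transformers import AutoModelForCausalLM
            from peft import LoraConfig, get_peft_model

            # Start from the released weights; the original training data never
            # enters the picture.
            model = AutoModelForCausalLM.from_pretrained(
                "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

            config = LoraConfig(r=8, lora_alpha=16,
                                target_modules=["q_proj", "v_proj"],  # attention projections
                                task_type="CAUSAL_LM")
            model = get_peft_model(model, config)  # only small adapter matrices will train
            model.print_trainable_parameters()     # typically well under 1% of all weights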

          Another note is that this may be the more ethical option. I’m sure the training data contained lots of copyrighted content, and if my content was in there I would prefer that it was released as opaque weights rather than published in a zip file for anyone to read for free.

          • jonex a day ago

            It takes away the ability to know what it does though, which is also often considered an important aspect. By not publishing details on how to train the model, there's no way to know if they have included intentional misbehavior in the training. If they'd provide everything needed to train your own model, you could ensure that it's not by choosing your own data using the same methodology.

            IMO it should be considered freeware, and only partially open. It's like releasing an open source program with a part of it delivered as a binary.

      • z3c0 a day ago

        I don't know if that compares to an AI model, where the most significant portions are the data preparation and training. The code DeepSeek released only demonstrates how to use the given weights for inferencing with Torch/Triton. I wouldn't consider that an open-source model, just wrapper code for publicly available weights.

        I think a closer comparison would be Android and GApps, where if you remove the latter, most would deem the phone unusable.

    • Palmik a day ago

      Except the "binary" is not really opaque, and can be "edited" in exactly the same way it was produced in the first place (continued pre-training / fine-tuning).

    • cedws a day ago

      Even with the training material, what good is it? The model isn't reproducible, and even if it were, you're not going to spend the money to verify the output.

      • barnabee a day ago

        > The model isn’t reproducible

        Not necessarily[0], it's a WIP, but: https://github.com/huggingface/open-r1

        [0] Surely they won't end up with the exact same weights, but it should be possible to verify something about the model and approach

      • deegles a day ago

        I guess something like a kickstarter campaign would be needed to get together the millions of dollars needed per training run

        • Gigachad 15 hours ago

          Who would fund that? What would be the point?

      • mistercheph a day ago

        Frontier models will never be reproducible in the freedom-loving countries that enforce intellectual property law, since they all depend on copyrighted content in their training data.

      • fragmede a day ago

        why not? If we could get a version of ChatGPT that wasn't censored and would tell me how to make meth, or an uncensored version of deepseek that would talk about tank man, you don't think the Internet would come together and make that happen?

    • seberino a day ago

      I'm not an expert, but didn't they release the weights under the MIT license? So you can make your own LLM with complete control, right?

      I agree it would be nice to know the details of their training, but simply calling this drop an "opaque binary" is seriously underselling it, no?

    • blackeyeblitzar a day ago

      It’s because prominent people with large followings are confusing the terms on purpose. Yann LeCun of Meta and Clem Delangue of Hugging Face constantly use the wrong terms for models that only release weights, and market them to their huge audiences as “open source”. This is a willful open washing campaign to benefit from the positivity that label generates.

      • seberino a day ago

        I agree it would be nice to have the training specifics. Nevertheless, everything DeepSeek released is under the MIT license, right? So you can go set up a cloud LLM, fine-tune it, and do whatever else you wish with it, right? That is pretty significant, no?

        • fragmede a day ago

          It is, but words mean things. If I said I got you a puppy and gave you a million dollars instead, that'd be nice, but what about the puppy?

    • dartos a day ago

      Yeah blame the crowds of newbies calling llama open source bc it was free after being leaked.

    • JumpCrisscross a day ago

      > amazing to me that "open source" has been so diluted

      It’s not and I called it [1].

      We had three options: (A) Open weights (favoured by Altman et al); (B) Open training data (favoured by some FOSS advocates); and (C) Open weights and model, which doesn’t provide the training data, but would let you derive the weights if you had it.

      OSI settled on (C) [2], but it did so late. FOSS argued for (B), but it’s impractical. So the world, for a while, had a choice between impractical (B) and the useful-if-flawed (A). The public, predictably, went with the pragmatic.

      This was Betamax vs VHS, except in natural language. There is still hope for (C). But it relies on (A) being rendered impractical. Unfortunately, the path to that flows through institutionalising OpenAI et al's TOS-based fair use paradigm. Which means while we may get a definition (not exactly (B), but (A) absent use restrictions) we'll also get restrictions on even using Chinese AI.

      [1] https://news.ycombinator.com/item?id=41047269

      [2] https://opensource.org/ai/open-source-ai-definition

      • sho_hn a day ago

        We absolutely had a choice (D), in that no one was forced to call it "open source" at all, which was arguably done to unfaithfully communicate benefits that don't exist. This is the part that riles people up, and that furthermore is causing collateral damage outside the AI bubble, and is nothing like Betamax vs. VHS.

        If you want to prioritize pragmatism, the fact that every discussion of this includes a lengthy "so what open source do you mean, exactly?" subthread proves this was a poor choice. It causes uncertainty that also makes it harder for the folks releasing these models to make their case and be taken seriously for their approach.

        We should probably call them "free to run", if the "it's cheap" connotation of "freeware" needs to be avoided. Or maybe "open architecture" to appreciate the Python file that utilizes the weights more.

        • JumpCrisscross a day ago

          > We absolutely had a choice (D), in that no one was forced to call it "open source" at all

          Technically yes, practically no.

          You’re describing a prisoner’s dilemma. The term was available, there was (and remains) genuine ambiguity over what it meant in this context, and there are first-mover advantages in branding. (Exhibit A: how we label charges).

          > causing collateral damage outside the AI bubble, and is nothing like Betamax vs. VHS

          Standards wars have collateral damage.

          > We should probably call them "free to run", if the "it's cheap" connotation of "freeware" needs to be avoided. Or maybe "open architecture"

          Language is parsimonious. A neologism will never win when a semantic shift will do.

          • sho_hn a day ago

            > Language is parsimonious. A neologism will never win when a semantic shift will do.

            Agreed, but I think it's worth lamenting the danger in that. History is certainly full of transitory calamity and harm when semantic shifts detach labels from reality.

            I guess we're in any case in "damage is done" territory. The question is more about where to go next. It does appear that the term "open source" isn't working for what these folks are doing (you could even argue whether the "available" term they chose was a strong one to lean on in the first place), so we'll see what direction the next shift takes.

            • JumpCrisscross a day ago

              > we're in any case in "damage is done" territory. The question is more about where to go next

              Sort of. We can learn from the example. Perfect is the enemy of the good.

        • nightski a day ago

          The source code is absolutely open, which is the traditional meaning of open source. You want to expand this to include datasets, which is fine, but that is where the divergence lies.

          • JumpCrisscross a day ago

            > source code is absolutely open

            It’s ambiguously open.

          • HPsquared a day ago

            Data is code, code is data.

          • lyu07282 a day ago

            No, no, no: the code for (pre-)training wasn't released either, and it is non-trivial to replicate. Releasing the weights without the dataset and training code is the equivalent of releasing a binary executable and calling it open source. Freeware would be more accurate terminology.

            • seberino a day ago

              I think I see what you mean. I suppose it is kinda like an opaque binary; nevertheless, you can use it freely, since it's all under the MIT license, right?

              • lyu07282 a day ago

                Yes, even for commercial purposes, which is great, but the reason "open source" became popular is that you can modify the underlying source code of the binary, which you can then recompile with your modifications included (as well as sell/publish your modifications). You can't do that with deepseek or most other LLMs that claim to be open source. The point isn't that this makes it bad; the point is we shouldn't call it open source, because we shouldn't lose focus on the goal of a truly open source (or free software) LLM at the same level as chatgpt/o1.

                • nightski a day ago

                  You can modify the weights which is exactly what they do when training initially. You do not even need to do it in exactly the same fashion. You could change things such as the optimizer and it would still work. So in my opinion it is nothing like an opaque binary. It's just data.

                  • lyu07282 a day ago

                    We have the weights and the code for inference, in the analogy this is an executable binary. We are missing the code and data for training, that's the "source code".

                    • JumpCrisscross a day ago

                      > that's the "source code"

                      Then it’s never distributable and any definition of open source requiring it to be is DOA. It’s interesting, as an argument against copyright. But that academic.

                      • fragmede a day ago

                        it's not academic. Why can't ChatGPT tell me how to make meth? Why doesn't deepseek want to talk about Tiananmen Square? What else has the model been coerced into saying or not saying? Without the full source, we don't know.

                    • cycomanic a day ago

                      While I appreciate the argument that the term "open source" is problematic in the context of AI models, I think saying the training data is the "source code" is even worse, because it broadens the definition to be almost meaningless. We never considered data to be source code, and realistically, for 99.9999% of users the training data is not the preferred way of modifying the model: they don't have the millions of dollars to retrain the full model, and they likely don't even have the HDD space to store the training data.

                      Also, I would say arguing that the model weights are just the "binary" is disingenuous, because nobody wants releases that contain only the training data and training scripts and not the model weights (which would be perfectly fine for open source software if we argue that the weights are just the binaries): such releases would be useless to almost everyone, since almost nobody has the resources to train the model.

  • culi a day ago

    No, it's fully open sourced. Even Janus is.

    https://github.com/deepseek-ai

    More importantly, they spelled out their methodology in depth in a paper (the code/implementation is trivial in comparison to the methodology)

    • Philpax a day ago

      If it's fully open source, where's the code for training it? The implementation - at least, theirs - is also not trivial, as they've mentioned optimising below the CUDA level to get maximum throughput out of their cluster.

      I'm very appreciative of what they've done, but it's open weights and methodology, not open source.

    • aldanor a day ago

      That's just inference code.

  • ComputerGuru a day ago

    They “open sourced” it enough (via the whitepaper) that huggingface is trying to reproduce their training now.

    • dartos a day ago

      How is it open source at all with no source?

      Paxos isn’t open source just because you can read the paxos paper.

      • nexus_six a day ago

        For people who have a background in neural networks and machine learning, I imagine replicating the paper in some type of framework would be straightforward, right? Or am I mistaken?

        • jampekka a day ago

          The model itself, yes. The changes from previous architectures are often quite small code-wise - often just adding or changing a few lines in a torch model.
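
          For instance, a toy sketch (not any specific paper's change): swapping LayerNorm for RMSNorm, the normalization most recent LLMs use, is about a dozen lines in torch:

            import torch
            import torch.nn as nn

            class RMSNorm(nn.Module):
                def __init__(self, dim: int, eps: float = 1e-6):
                    super().__init__()
                    self.weight = nn.Parameter(torch.ones(dim))
                    self.eps = eps

                def forward(self, x: torch.Tensor) -> torch.Tensor:
                    # Scale by the reciprocal root-mean-square instead of
                    # centering on the mean and dividing by the variance.
                    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight

            block = nn.TransformerEncoderLayer(d_model=512, nhead=8)
            block.norm1 = RMSNorm(512)  # the actual "change": two assignments
            block.norm2 = RMSNorm(512)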

          Things like tweaking all the hyperparameters to make the training process actually work may be trickier, though.

      • ru552 a day ago

        With an LLM, the actual 0s and 1s of the model are fairly standard, common, and freely available to anyone who wants to use them. The "source code" for an LLM is the process used to create the outcome and, to an extent, the data used to train with. DeepSeek released a highly detailed paper that describes the process used to create the outcome. People and companies are actively trying to reproduce the work of DeepSeek to confirm the findings.

        It's more akin to scientific research where everyone is using the same molecules, but depending on the process you put the molecules through, you get a different outcome.

        • dartos a day ago

          > With an LLM, the actual 0s and 1s of the model are fairly standard, common, and freely available to anyone that wants to use them

          How is that different than the 0s and 1s of a program?

          Assembly instructions are literally standard. What’s more, if said program uses something like Java, the byte code is even _more_ understandable. So much so that there is an ecosystem of Java decompilers.

          Binary files are not the “source” in question when talking about “open source”

          • fuzzbazz a day ago

            There is no way to decompile an LLM's weights and obtain a somewhat meaningful, reproducible source, like with a program binary as you say. In fact, if we were to compare both in this way that would make a program binary more "open source".

            • dartos a day ago

              Yes, that is my exact argument.

      • ComputerGuru a day ago

        As an actual FOSS developer: they didn’t open source it.

        But I was merely adding the missing context using the (sorry) lingua franca of AI.

      • og_kalu a day ago

        They released the weights

    • jayd16 a day ago

      Publishing a white paper doesn't qualify as open source in any other context.

      Google Spanner has a nice white paper but you wouldn't consider it open source, for example.

  • marcosdumay a day ago

    Weights are actually all you have. The "Open Source" name never applies to LLMs because they don't have a source.

    But China did distribute them with sharing-friendly terms, which is completely different from others, like Meta, and makes the name far less misleading this time.

    • frontfor 21 hours ago

      Stop referring to them as “China”. It’s as ridiculous as referring to Meta as “America”.

  • mritchie712 a day ago

    yes, all the training code is still closed, and it doesn't seem it will ever be released. Here's[0] a comment from a dev who worked at DeepSeek.

    tldr: we're already on to the next model, don't expect anything else to get open sourced.

    > I was just told that the number of people there is too limited, and open-sourcing needs another layer of hard work beyond making the training framework brrr on their own infra. So their priority has been to open-source everything that is MINIMUM + NECESSARY for the community, while pushing most efforts toward iterating to the next generation of models, I think. They have written everything clearly in the technical reports and encourage the community to engage in reproduction, which is a unique insight of the team as well, I think.

    0 - https://x.com/wzihanw/status/1884374329334387017

  • tarsinge a day ago

    Others? Do OpenAI, Google or Anthropic release weights?

  • badgersnake a day ago

    Yeah, it’s not opensource. It’s just not SaaS. We need to call out these AI companies more on this.

    • otterley a day ago

      We have a term for this: "freeware."

      • badgersnake a day ago

        Sure, call it that then. Cut the open source bollocks.

garspin a day ago

An alternative explanation...

Deepseek is a side project for a hedge fund.

Shorting NVIDIA & releasing everything including the source would have a high probability of being hugely profitable, with almost zero downside if it went unnoticed.

  • ukoki 12 hours ago

    That would be an interesting 'insider trading' work-around: use inside knowledge of a private, non-US company to trade correlated stocks in public US companies.

    If it were a public US company releasing DeepSeek and you shorted Nvidia based on inside knowledge, presumably you'd fall foul of insider trading rules?

  • sinuhe69 16 hours ago

    An interesting spin, but not an easy feat in any way. Yes, once you see the benchmark results, you can start speculating about their impact on the stock market.

    But that's a consequence of a highly concentrated effort on the primary project (DeepSeek), not a side project.

tgtweak a day ago

Wasn't there an internal Google email or memo that stated as much as well? That open source was moving faster and more efficiently than the best private teams and that it was accelerating - basically calling this out about 18 months early?

[1] https://www.artisana.ai/articles/leaked-google-memo-claiming...

  • serial_dev a day ago

    There was, but as I understand it, it’s really just one dude’s opinion.

  No team or consensus behind it, so it's not really news in my opinion. In fact, I'd be surprised if nobody at Google believed in open-source AI.

  The take is of course interesting, and as an open source guy, I like it and hope he is right.

jsemrau a day ago

The future of LLMs is shared research, and that's the part I really like. It's ok, in my opinion, if not everything is shared, but this is too important to be locked up in one company.

zbshqoa a day ago

Shouldn't that be the premise of a company that has "open" in its name as well?

  • igorguerrero a day ago

    Not here, here we lick the boot of sama to get crumbs for our startups ;-)

kidsil a day ago

Linux won in the long run; I don't see why robust open LLMs won't do the same.

In the end it'll be the scale of the infrastructure itself that will make the difference.

  • maxloh a day ago

    It is a different landscape IMO.

    The model's source code (the training data) is hundreds of GB and much harder to transfer. The compiling (training) process is also very costly. This is very different from the Linux case.

    Only big tech companies have enough resources to make these things happen.

    I like looneysquash's viewpoint about the definition of open source AI. You will need to have all parts involved open-sourced to make a model "open", not just the weights:

    > The trained model is object code. Think of it as Java byte code. You have some sort of engine that runs the model. That's like the JVM, and the JIT. And you have the program that takes the training data and trains the model. That's your compiler, your javac, your Makefile and your make. And you have the training data itself, that's your source code.

    > Each of the above pieces has its own source code. And the training set is also source code. All those pieces have to be open to have a fully open system. If only the training data is open, that's like having the source, but the compiler is proprietary. If everything but the training set is open, well, that's like giving me gcc and calling it Microsoft Word.

    https://news.ycombinator.com/item?id=41952722

    • aldanor a day ago

      You're off by quite a few orders of magnitude in regards to the size of the training data...

      Another point being, who knows if they really have legal rights to use all that data for training.

hhthrowaway1230 a day ago

You can't reproduce this model from the source, because the source isn't given; the result is given. Hence, not open source.

  • BoorishBears a day ago

    I usually ignore people who say this, because it guarantees they're not actually doing anything meaningful with these models and just want to quibble over semantics from the sidelines...

    https://huggingface.co/blog/open-r1

    They didn't just toss model weights over a fence; they shared exactly how to do what they did. They made a meaningful contribution that people are readily replicating with other models.

mirawelner a day ago

I think at the end of the day the reason they open-sourced DeepSeek is that they are programmers. Programmers like to show people the cool stuff they did. I had a boss who was rich enough to retire but was working three jobs, because programming is cool and fun, and he wanted to do cool and fun things and show people the stuff he did.

Everybody is trying to come up with a money-related reason for why they open sourced it, but at the end of the day the people who made it are engineers, not businesspeople. DeepSeek is really freaking cool, and they wanted to show people the cool thing they did.

  • danielbln a day ago

    This take is especially funny when you realize that DeepSeek is part of High-Flyer, a quant fund. It doesn't get more "money" than that.

  • zbendefy a day ago

    I don't think they rent GPUs for $5 million because it's cool and they want to show the world...

  • aldanor a day ago

    All the cool stuff they did, if any, is arguably in the training code which is not open source.

tempeler 17 hours ago

Some benefits of US sanctions: the only way for Chinese AI to spread, in the West or elsewhere, is to be open source. US monopolies may prevent competition inside the US, but I don't think other countries would be willing to join in, so they are actually doing the rest of the world a favor. Will the hardware wars make open source run on cheap hardware too? Are they trying to make everyone dependent on China through this kind of paranoia? What are they doing? The US is going in the wrong direction: the inefficient players should say goodbye to the market and leave it to the efficient ones. They seem to be backing the monopolies and bringing about their own end.

serverlessmania a day ago

It's not open source: we have no idea about the data used to train the model, and the paper doesn't explain it all.

  • pedalpete a day ago

    Is this an important consideration in open sourcing an AI model?

    I would think the code to build your own is open sourced, and you can feed it any data you'd like. That's the open source part, not the part where they are running the model.

    Have I misunderstood this?

    • kelipso a day ago

      It’s a common complaint on open sourced ML models that they don’t provide or describe the data used to train the model. Sometimes it’s a valid complaint, since it may not be clear what kind of data was used to train the model, and sometimes it’s not since it’s clear.

      I think it’s kind of an overdone complaint and I usually ignore it, and besides it looks like there’s a huggingface project ongoing where they’re trying to replicate the training process for this model anyway.

kodzoman a day ago

Lago doesn't seem to be really open source, since it doesn't even support basic features like credit notes in the free version.

bufferoverflow a day ago

And who will pay for all the expensive AI hardware? We're getting into the crazy phase of hundred billion dollar data centers.

Just because R1 was trained cheaply doesn't mean that this architecture cannot be trained in a very expensive data center to get much better and bigger models.

  • visarga a day ago

    R1 stands out not just because of efficient training, but because it created its own training data. It works similarly to AlphaGo: it tries to solve problems and has a way to check when the result is correct. The trick is to let it run more, to make better training data. I bet those datacenters will work more on problem solving than on training.
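
    The loop being described, in miniature (a sketch: a random guesser stands in for the model and arithmetic stands in for a verifiable task; this shows the idea, not DeepSeek's actual pipeline):

      import random

      problems = [(f"{a} + {b} = ?", a + b) for a in range(10) for b in range(10)]

      kept = []
      for prompt, truth in problems:
          for _ in range(64):                  # "let it run more" -> more verified data
              attempt = random.randint(0, 20)  # stand-in for sampling the model
              if attempt == truth:             # automatic check = the reward signal
                  kept.append((prompt, str(attempt)))
                  break

      print(f"{len(kept)} verified examples collected for the next training round")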

apples_oranges a day ago

“We Have No Moat, And Neither Does OpenAI” - Google

  • xnx a day ago

    Of any company, Google has the largest moat:

    1) Google AI datacenters built around TPUs are much more efficient than anything Nvidia based.

    2) Google has the crawl infrastructure and experience to continually get the freshest data.

    3) Google has lots of paid and voluntary training data from users.

    • brap a day ago

      Most importantly, Google has the userbase to rule all userbases. I’d argue that for over 90% of people online, Google is their gateway to information. Whether it’s search, Chrome, or Android.

      Not to mention countless other popular apps that Google has. YouTube anyone?

      They’re also the most well positioned company to profit from cheap AI, their ads network is a behemoth.

      So yeah, add that up with the compute, the data, and the talent, and it’s pretty clear that Google is not a force to dismiss.

      If anything I think DeepSeek is great news for Google.

    • danieldk a day ago

      On the other hand, Google also stands to lose a lot. People will replace a portion of their search queries with LLM chat.

      • SkyPuncher a day ago

        I already have. Never thought I would, but Google search results are literally unusable for me.

        Not to mention, LLMs are way better at synthesizing multiple sources into a coherent response. I end up asking an LLM first and searching only as secondary research.

        • aerhardt a day ago

          > Google search results are literally unusable for me

          Totally in the same boat. Information is just much harder to find and friction becomes higher. I'd rather deal with the occasional hallucination than with the utterly enshittified SERP experience.

      • luma a day ago

        That's only a problem if you presume that Google cannot figure out a way to monetize being the second brain that you offload a lot of cognitive tasks to. Hey google I'm hungry, ok how does Pizza sound? Great, make it so. OK, sending an order to pizza-company-that-paid-Google.

        They stand to tap into something far more powerful than advertising if they can position themselves as your agent.

      • xnx a day ago

        I certainly have replaced many of my Google Searches with Gemini chats

        • rurban 17 hours ago

          Because they haven't started the enshittification of Gemini chat results yet, where you'll have to wade through paid ad spam, similar to hearing politicians talk.

          That's the real advantage of the open deepseek weights. They cannot enshittify them; you can run the model locally, just with an old snapshot.

  • dauertewigkeit a day ago

    It ironically seems like a very similar market to internet search. There was no moat there either, other than the capital needed to bankroll a better search engine. A lot of these AI companies will eventually fail (not because their models will be significantly worse, but because of failure to commercialize), and the market will consolidate with only a couple of players (maybe two in the US, one in China, and maybe one in Russia). And once that happens, the idea of raising enough capital and building a competitive AI company will seem impossible. Exactly like what transpired with internet search after Google won most of the market.

    • mjburgess a day ago

      Oof, no -- it's quite the opposite, and it may well lead to Google's collapse in the future.

      Holding exabytes of data to be processed on commodity hardware to enable internet-wide search, all the while monetising it man-in-the-middle through an ad business, created a tremendous moat. Entering that market is limited to tech multinationals, and they have to deliver a much superior experience to overcome it. To perform a Google search you need Google-sized data centres.

      Here we have exactly the opposite dynamics: high-quality search results (/prompt-answers) are as of now incredibly commoditized, and accessible at inference time to any person who has $25k. That's going to be <= $10k soon.

      And innovation in the space has also gone from needing >$1bn to <=$50M.

      A higher-quality search experience is now available at absolutely trivial prices.

      • stuartjohnson12 a day ago

        That's only because LLMs haven't been a target until now. Search worked great back before everything became algorithmically optimised to high hell. Over time, the quality of information degrades because as metric manipulation becomes more effective, every quality signal becomes weaker.

        Right now, automated knowledge gathering absolutely wipes the floor with automated bias. Cloudflare has an AI blocker which still can't stop residential proxies with suitably configured crawlers. The technology for LLM crawling/training is still mostly unknown, even to engineers, so no SEO wranglers have been able to game training data filters successfully. All LLMs have access to the same dataset - the internet.

        Once you:

        1. Publicly reveal how training data is pre-processed

        2. Roll out a reputation score that makes it hard for bots to operate

        3. Begin training on non-public data, such as synthetic datasets

        4. Give manipulated data a few more years to accumulate and find its way into training data

        It becomes a lot harder.

    • fourside a day ago

      Google did absolutely have a moat on internet search. It wasn't just about bankrolling an alternative, as Microsoft proved time and time again.

  • jszymborski a day ago

    In fairness, it's at least a $5.6M moat at the moment, which is not exactly "no moat" but it is demonstrably not an insurmountable one, and it might become more shallow yet with time.

nokun7 a day ago

While open-source LLMs offer transparency and community-driven innovation, the future might not be exclusively OSS. Proprietary models have significant advantages, including the ability to secure investment for cutting-edge development, customize for specific business needs, and maintain competitive edges through secrecy. Moreover, companies can directly monetize proprietary models, providing a clear path to profitability, and they can offer enhanced security and privacy controls crucial for sensitive applications. Thus, both open-source and proprietary LLMs are likely to continue playing vital roles in AI's future landscape.

  • spaceribs a day ago

    While I'm sure there are monetary benefits of making your LLM proprietary, I'm not sure there's a benefit to extending someone else's proprietary LLM.

9cb14c1ec0 a day ago

I am running Deepseek R1 on my AMD Ryzen 7 PRO 5850U integrated GPU. While my experience with R1 doesn't make me think well of it, it is impressive how fast it is on such a weak graphics processor.

  • zbendefy a day ago

    Note: you are probably running a distilled version of R1, which is actually LLama or Qwen further trained on the input/output of R1.

    The full R1 is huge (~700GB), although there are quantized versions; the smallest one is around 150GB (1.58-bit).
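
    If you want to check what's actually on disk, ollama's local HTTP API can tell you. A minimal sketch in Python, assuming the default daemon on localhost:11434 (model names will vary with whatever you pulled):

        import requests  # assumes the ollama daemon is running locally on its default port

        # /api/tags lists the models you've pulled, with size and quantization details
        resp = requests.get("http://localhost:11434/api/tags", timeout=5)
        for model in resp.json().get("models", []):
            details = model.get("details", {})
            print(model["name"], details.get("parameter_size"), details.get("quantization_level"))

    A "deepseek-r1:7b" or "deepseek-r1:32b" tag is one of the distills; only the 671b tag is the real thing.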

    • 9cb14c1ec0 a day ago

      Oh, that's interesting. I didn't know that the ollama version wasn't the whole thing.

      • postalrat a day ago

        ollama deepseek-r1:671b is

  • mmoskal a day ago

    You're most likely running a distilled version. The full model is ~700GB.

  • Fergusonb a day ago

    The default model on ollama is the 7b distillation.

    Its ability to solve basic math problems with reasoning is pretty cool, but other models of that size (qwen 2.5, phi4) have been generally more useful to me.

    These tiny models still strike me as toys, without a whole lot of real-world utility.

    • unethical_ban a day ago

      Yeah phi4 has been as good or better for me than r1-qwen 32b for general queries

basileafe a day ago

Remember when OpenAI's CTO squirmed in response to a question about using data from YouTube? https://digg.com/digg-vids/link/open-ai-ceo

OpenAI's CTO, Mira Murati, found herself in a tight spot when questioned about using YouTube data to train Sora. Her uncertain response has sparked controversy and raised concerns about their ethics in collecting and training data. This incident has fueled a growing debate about AI companies' data practices.

Then YouTube's CEO, Neal Mohan, said that if OpenAI used YouTube content without permission, it would violate their terms of service. Shouldn't Neal be freaking out the way they are now? Clearly they are scared; they know people are canceling their subscriptions to use free and better technologies. I know of hundreds of people who canceled their GPT subscriptions. Many developers are replacing the expensive GPT models with the free DeepSeek.

Here is the AI current story:

Imagine two AI trains chugging along the tracks of innovation. The first, driven by OpenAI, was the early leader, after building on Google's transformer architecture (without which they wouldn't exist). They charged a hefty fare for anyone to hop aboard. We don't know how they trained their data. And big companies felt they had to buy tickets or risk being left behind. OpenAI thought they were the only engine in town. But then another train pulled up alongside them. This new locomotive, powered by the smart folks at DeepSeek, matched OpenAI's speed and fancy gadgets, if not bettered them. The kicker? Everyone could ride for free!

Now, OpenAI's train is losing steam. People are jumping ship, with hundreds canceling their pricey GPT subscriptions. Meanwhile, the free train is picking up speed, aiming to make AI available to all.

In this tale of two trains, OpenAI might need to change their name to "ClosedAI" if they keep putting up barriers and staying closed. The free and open train? That's the one chugging towards a brighter, better, free AI future for everyone.

deepseek = Open AI

  • delgaudm a day ago

    OT, but whoa... Digg. There is a name from the past.

    • wholinator2 a day ago

      Right? That's so strange. I went for a look around and saw "articles" with apparently tens of thousands of "reads" but 0 comments. I don't know if they're locked behind a login or what but it feels like something is off there

    • exe34 a day ago

      Now that's a name I have not heard in a long time...

  • maybelsyrup a day ago

    > In this tale of two trains, OpenAI might need to change their name to "ClosedAI" if they keep putting up barriers, being closed. The free and open train? That's the one chugging towards a brighter, better, free AI future for everyone.

    This answer itself is an AI product, right? Like you're making a meta-point about something

  • alecco a day ago

    Deepseek is remarkable, but they explicitly say they built on top of Meta's Llama and Alibaba's Qwen. They scored the goal but there were other players involved to get there.

    • boroboro4 a day ago

      I feel like you're confusing this with their distill models (i.e. 1.5/7/8/32/70B) being built on top of Llama & Qwen models. But those aren't really the remarkable models.

      The truly remarkable model is DeepSeek-R1, and it's their own model, with a very particular DeepSeek architecture. Of course they build on the knowledge of other labs, just like other labs build on top of their/others' knowledge. They are miles ahead of Meta in terms of base architecture at the moment, and you can watch them iterating throughout the last year to get to where they are now.

    • rtkwe a day ago

      That's true of practically everything though. Completely out of the blue technical inventions are pretty rare.

  • joshl32532 a day ago

    > I know of hundreds of people who canceled their GPT subscriptions.

    Did you make a poll or something?

herval a day ago

DeepSeek's gambit proves that about as much as Stable Diffusion proved that the future of diffusion models is open-source. In other words, it doesn't prove anything.

aprilfoo a day ago

The current AI mega-buzz, fueled by fascinating technologies, finance and even geopolitics, makes it difficult to have a serious analysis beyond opinions and reactions. But the shock waves of that announcement by a small tech company are quite interesting.

> In fact, making it easier and cheaper to build LLMs would erode their [OpenAI, Meta, Google etc] advantages!

The narrative until now was: AI requires enormous and cutting-edge resources (money, energy), so it's only for the big boys and people who can talk multi-billion investments, so open source was not an option.

Some signs have already appeared recently (a plateau? a bubble?), and DeepSeek seems to show that this model is questionable.

whatever1 a day ago

Open sourcing Llama just ensured that OpenAI will not create a dominant ecosystem that attracts most of the organic web traffic.

Meta's bet paid off, but at what cost.

  • varsketiz a day ago

    What is the cost you imply?

    • __MatrixMan__ a day ago

      It got harder to lie to the world about what's possible. Oh dang.

jaharios a day ago

As I see it, what most big players are hunting is new data. Gaining trust is important; being the next big thing is also good. Being seen as "open" (not open source) makes others think you have nothing to hide and good intentions.

hsuduebc2 a day ago

Nothing was proven. It's just an empty statement. I would guess that future LLMs will be largely based on those which are open sourced today, but the products which are most useful will be held proprietary. For the end user, the main argument is convenience and ease of use.

Exactly how it happened in operating systems.

bityard a day ago

> but trained on inferior hardware for a fraction of the price

Do we know that this is actually true?

mmaunder a day ago

The argument re OpenAI continuing to lead falls flat when you consider the talent they've lost. It's a different company compared to the one that built and launched GPT-4.

SathyaQuikFlip a day ago

DeepSeek is proof that models can end up just as good as one another, and that the best models will eventually be open-source. I believe it's a good thing for our world.

liminal a day ago

I'd love to see the training data open sourced for all models so we can be sure no copyrighted material has been used. Just kidding, we all know it's stolen.

  • visarga a day ago

    > Just kidding, we all know it's stolen.

    This thief has small pockets, about 500x smaller than the "stolen" material. Where to stash all that?

    • forty a day ago

      They kept the jewelry and threw away the rest?

varsketiz a day ago

Sorry for possibly a stupid question, but what is the license for commercial use? Say I want to run R1 in my DC, build a product on top, and charge people for it. Is it MIT?

  • alalv a day ago

    Yes, it is an MIT license

alecco a day ago

This is quite bad blogspam appealing to the open source crowd. They didn't even bother to read a bit.

Deepseek is open source because the founders are part of the new generation of Chinese graduates, who relate more to the global youth than to Boomer Chinese CEOs who are completely out of touch. And right on time, because the CCP is fed up with them, too.

Last week Deepseek founder Liang Wenfeng was speaking practically face to face with Chinese Premier Li Qiang at a symposium: https://www.youtube.com/watch?v=zMyc3vhpLyI. And they seem to be quite aligned.

Why didn't this blogspam of an article pick up on any of that?

https://news.ycombinator.com/item?id=42852266

ge96 a day ago

Tin foil hat: has anyone run it and used Wireshark to see whether it makes external requests (unless it has to, like a browser agent)?

  • exitb a day ago

    It’s just numbers. You use other open source software to run it.

  • ru552 a day ago

    It's been confirmed to run on a machine with no internet access. So it isn't reliant on external requests, though it could still be trying to make them.

  • nexus_six a day ago

    I've done this (not thoroughly by any means) with OpenSnitch on the Ubuntu machine I have ollama installed on running the 32b R1 weights. No network traffic.

    I'm not entirely sure if it's possible to do some kind of code execution from just the weights themselves; someone who knows a bit more about this can weigh in here.
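
    For anyone who wants a quick-and-dirty check without setting up OpenSnitch, here's a rough sketch with psutil (an assumption on my part, not what I actually ran; it only catches connections open at the instant you poll, so it's no substitute for a real firewall):

        import psutil  # pip install psutil; may need root to resolve PIDs on some OSes

        # flag established connections whose remote end is not loopback,
        # along with the owning process name
        for conn in psutil.net_connections(kind="inet"):
            if conn.raddr and conn.raddr.ip not in ("127.0.0.1", "::1"):
                name = psutil.Process(conn.pid).name() if conn.pid else "?"
                print(name, "->", conn.raddr.ip, conn.raddr.port)

    Run it while the model is generating and see whether the ollama process shows up.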

  • marcodiego a day ago

    Isn't DeepSeek simple/small enough you can run it locally?

    • zbendefy a day ago

      No, the full R1 model is ~650GB. There are quantized versions that bring it down to ~150GB.

      What you can run locally are the distilled models, which are actually Llama and Qwen weights further trained on R1's output.

    • deepsquirrelnet a day ago

      At least a TB of VRAM to load it in fp16. They distilled to smaller models, which do not perform as well, but can be run on a single GPU. Full R1 is big though.

      • nickthegreek 7 hours ago

        fp16? I thought it was trained at fp8.
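
        Back-of-the-envelope either way (a toy calculation for the weights alone, ignoring KV cache and activations):

            params = 671e9  # R1's reported parameter count

            for name, bytes_per_param in [("fp16", 2), ("fp8", 1), ("1.58-bit", 1.58 / 8)]:
                print(f"{name}: ~{params * bytes_per_param / 1e9:,.0f} GB")
            # fp16 ~1,342 GB, fp8 ~671 GB, 1.58-bit ~133 GB

        The ~700GB figure quoted upthread matches fp8; fp16 is where "at least a TB" comes from.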

  • derektank a day ago

    Yeah, I would want to double check and confirm they're using safe serialization methods at the very least before using the weights from any model released by a Chinese entity
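
    One practical mitigation: stick to the safetensors shards and avoid pickle-based .pt/.bin checkpoints, since torch.load deserializes via pickle and can execute arbitrary code, while safetensors files are just raw tensors. A minimal sketch for inspecting a shard without loading it (the filename is hypothetical):

        from safetensors import safe_open  # pip install safetensors

        # inspect a checkpoint without deserializing anything executable
        with safe_open("model-00001-of-00163.safetensors", framework="pt") as f:
            for key in f.keys():
                print(key, f.get_slice(key).get_shape())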

titzer a day ago

I, for one, abhor the idea of megacorps running models and AI as a service as they do now. If nothing else, the internet proved to us that an absolute gold mine of technological value can and will be enshittified to the point of unusability when it is cornered by Big Tech. I shudder to think of models trained specifically to convince people to buy things--and I am looking directly at Big Tech's advertising model as one of the worst possible incubators for this technology.

Don't forget to drink your Ovaltine.

1970-01-01 a day ago

>Does that mean proprietary AI is done? No.

Perfectly stated.

The AI jump-to-conclusions mat is so worn down, it's become paper thin. The shock of DeepSeek's costs does not auto-magically force all LLMs to become open source. Silicon Valley tech has always favored whoever delivers on the cheaper-better-faster trifecta. Anyone with an MBA should know this includes open-source LLMs. As of today, DeepSeek is ahead. As soon as OpenAI answers with a new 'fast-food dollar menu' for ChatGPT, with 'even more special' secret-sauce ingredients, we're going to see them back to normal business.

BoorishBears a day ago

a) As soon as I saw the domain I knew this was an ad (Lago has nothing to add to this conversation)

b) DeepSeek is the most dangerous thing that's happened to Open Source models in recent memory, through no fault of their own.

The hysteria has outrun the reality, and now there's going to be a similarly disproportionate backlash.

It's already happening: just this morning, Anthropic's CEO was simultaneously railing against what they achieved and using it to justify stronger export restrictions.

And our current government doesn't want to be going on stage talking about $50B mega projects only for laypeople to (mistakenly) believe it only takes a few million to do the same.

And the idea that a Chinese company is the one that did this is going to play into so many hands, so perfectly. You can see the censorship story start taking over the narrative, despite this not being the first or last Chinese-hosted model to comply with Chinese law.

Soon the national security angle will break out, especially if someone jailbreaks or abliterates it and gets "harmful outputs" that other models would also happily produce.

Some will couch the (very temporary and irrational) dip the market faced as a Chinese company managing to harm our markets by providing an unfairly priced product or some nonsense.

Open source AI is not guaranteed. We might still see protectionist bans against releasing models over a certain size and other irrational nonsense, and this has played into the kind of hysteria that allows that to happen.

soheil a day ago

Why are people so willing to believe false proofs/headlines? Clickbait has existed for decades, yet I still believe people are as gullible as on day one. Articles like this, and those from sites like phys.org, are great examples: they regularly get hundreds of upvotes based on completely ridiculous and false premises.

Always been fascinating to me how often rhetoric wins over substance on hn.

shahzaibmushtaq a day ago

It's DeepSeek's low-price, low-investment reasoning models that have sent a shockwave around the world.

> Compare $60 per million output tokens for OpenAI o1 to $7 per million output tokens on Together AI for DeepSeek R1.
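
To make the quoted gap concrete, a toy calculation with a hypothetical workload (output tokens only; input pricing and model quality ignored):

    requests_per_day = 10_000          # hypothetical workload
    output_tokens_per_request = 1_000

    millions_of_tokens = requests_per_day * output_tokens_per_request / 1e6
    for name, price_per_million in [("OpenAI o1", 60), ("DeepSeek R1 on Together AI", 7)]:
        print(f"{name}: ${millions_of_tokens * price_per_million:,.0f}/day")
    # o1: $600/day vs R1: $70/day for the same output volume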

Open source on its own isn't a rational basis to prove anything. We can't even prove what's going to happen tomorrow; proving a statement about the future is utter nonsense.

chrchr a day ago

As long as entrenched Google, Meta and the Chinese Communist Party can use Open Source LLMs to kneecap upstart rivals, I agree. Once the upstarts are neutralized, the open LLMs will stop.

niyyou a day ago

Again. This. is. not. Open-source. At best, open-weights. Clickbait 100%.

CooCooCaCha a day ago

Yes and no. Intelligence scaling with compute makes sense so I doubt the advantage of closed models on large compute clusters will ever truly go away.

But that doesn’t mean smaller models aren’t useful.

fuddle a day ago

This looks like another LLM generated article.

gnarlouse a day ago

Yeah no, it makes way more sense that it's an attack by the Chinese government on the US economy.

prjkt a day ago

Source:

- Training SW: not released

- Inference SW: not released

- Evaluation SW: not released

- Data: not released

Output:

- Weights: released

DeepSeek is closed-source with *open-weights*

  • culi a day ago

    DeepSeek V3 and even Janus have all their software open sourced, and R1 should be fully open sourced soon as well. More importantly, they explicitly spelled out their methodology in a published paper for DeepSeek R1. Implementation is not as important imo, but we'll get that soon as well.

    https://github.com/deepseek-ai

danjl a day ago

Click bait headline. Nothing was proven about open source as the future. "To gain a foothold in Western markets, DeepSeek had to open-source its models." This is an opinion, not the only solution. DeepSeek could have remained proprietary just as easily. The bias of the author becomes clear at the end when he starts to promote his own open source company. Everyone has an agenda.

  • dang a day ago

    Ok, we've changed the title above to be that of the article.

    (Submitted title was "DeepSeek proves the future of LLMs is open-source".)

  • burrish a day ago

    Yep it's just an ad disguised as an article.

    Lots of them on the internet try to help users with basic Windows things, then suggest their app as a better alternative.

    • mbb70 a day ago

      Eh, it's not disguised as anything, it's content marketing. Lago the API billing solution is not suggesting their product as a better alternative to ChatGPT.

    • noname120 a day ago

      Is the member who posted related in any way to Lago?

      • that_guy_iain a day ago

        It'll almost certainly be Anh-Tho, the CEO.

        • throwup238 a day ago

          I went looking for their team on the About Us page and instead found this:

          > 10x top of HN

          > Billing remains a major issue for companies, resonating widely. We've consistently hit HackerNews' top page over 10x. [1]

          [1] https://www.getlago.com/about-us

          • noname120 10 hours ago

            Good thing that I flagged the post then (and sad thing that not enough people did for it to appear flagged)

          • that_guy_iain a day ago

            To be fair, their ability to target HN is very good. I'm always impressed with their marketing. They put a good title that resonates with HN while the original title is something else for SEO.

    • that_guy_iain a day ago

      While it's clearly content marketing aimed at the hype of DeepSeek, it only mentions Lago in a single sentence.

      It's just an article that is aimed to get you to hear about Lago, star their GitHub repository and eventually talk about the "open source" billing tool you heard about called Lago.

      (I put "open source" in quotes because I think its open source version is just freeware, with most features being a Call To Action to a paid version. Fair disclosure: I have https://github.com/billabear/billabear which is a competitor)

    • eclipxe a day ago

      Most articles are simply ads. Attention economy.

      • kingkawn a day ago

        Most comments are empty generalizations.

  • jerf a day ago

    Steelmanning the idea in general, this can be a form of commoditizing your complement: https://gwern.net/complement

    Now, commercially, this may not make a lot of sense at the moment because no one is getting filthy rich on high-moat AI-using applications such that commoditizing the AI itself is a good idea commercially. I'm not sure anyone would even be confident enough to bet the farm on the idea of someday being in that position.

    However, if you analyze this from the perspective of world politics, where both explicit and implicit strategies are based on which tech companies have what tech and where it is located, a different logic emerges. If China is concerned that the US really is ahead in AI tech, and that US financial and technical dominance is being driven by that lead and used to suck capital out of the countries that are behind, it makes all kinds of sense to commoditize the complements as basically a way of throwing the current game board up in the air and restarting again.

    (One may also note that this analysis also says that just straight-up stealing the OpenAI tech and slightly AI-washing it before handing it out to everyone is also a logical move. I don't know enough to have any independent opinion as to whether that's where DeepSeek came from. I'm just saying that given the visible circumstances it is a strong strategic move for China at this point.)

    • dleeftink a day ago

      > from the perspective of world politics

      Much of this purported strategy hinges on 'winning' at all costs by undermining the lead.

      What is there to be won at the end? Does one party taking the reins prevent the other from achieving similar capabilities? Is it necessary to win this race?

      Or is this a cumulative, distributed effort that benefits all of us?

      • jerf a day ago

        Whether it's true or not, world leaders' ears are being filled with the claim that whoever wins the AI race wins everything, because AI will be able to win every other contest. They're being told it is winner-take-all like no contest has been winner-take-all before.

        • jcgrillo a day ago

          And they're all stupid enough to actually believe it? Why would a world leader listen to anyone in tech? They should ask an actual expert.

          Edit: to be clear, what I mean is that to a first approximation technologists are charlatans and frauds. If you're looking for accurate information ask a scientist.

          • michaelt a day ago

            From a politician's perspective, scientists are like gold prospectors digging holes seemingly at random. $100 billion startups are what you get when the prospectors strike gold.

            Why would you discuss gold with the wild-haired eccentric at the bottom of a hole, who has not yet found any gold, when you could talk to a gold mine owner who has - and who employs 1500 voters, and who like you wears a suit and tie?

          • nightski a day ago

            They did. Yann LeCun and several other prominent researchers testified before congress.

            • jcgrillo a day ago

              I missed this, did they actually tell congress this is some kind of consequential winner takes all race with dire consequences for losing?

        • JumpCrisscross a day ago

          > world leader's ears are being filled with the claim that whoever wins the AI race wins everything, because AI will be able to win every other contest

          This describes a narrow slice of Silicon Valley numpties.

          World leaders see an economic opportunity. Both to spend and to produce. No politician will turn down the opportunity to announce half a trillion dollars of spending.

      • JumpCrisscross a day ago

        > Or is this a cumulative, distributed effort that benefits all of us?

        This. There are a few theories of geopolitics, one of the most successful being the ones we bunch under an umbrella called realism [1]. (The others are idealism [2] and liberalism [3]. Historia Civilis made a great three-part video series on these [4]. Note that Realpolitik [5], which relates to realism as its praxis, is not the same thing.)

        One of the consequences of realism is balance of power theory, which “suggests that states may secure their survival by preventing any one state from gaining enough military power to dominate all others” [6].

        What is to be won? Not being dominated; ideally: less war, since war is irrational. (See: Ukraine.) Does preventing others from dominating you prevent you from dominating others? No. Is it necessary to win? No. But that means ceding sovereignty and increasing the chances of violent conflict as geopolitical fault lines reälign.

        A note on liberalism: it works. But it requires a great power at its centre. America was that benevolent great power. Now it seems we don’t want to be. The power America has to hurt its allies, and the incentives to reap that advantage, are the consistent failure mode of liberal foreign-relation structures, since the days of the Delian League.

        [1] https://en.m.wikipedia.org/wiki/Realism_(international_relat...

        [2] https://en.m.wikipedia.org/wiki/Idealism_in_international_re...

        [3] https://en.m.wikipedia.org/wiki/Liberalism_(international_re...

        [4] https://youtu.be/CH1oYhTigyA

        [5] https://en.m.wikipedia.org/wiki/Realpolitik

        [6] https://en.m.wikipedia.org/wiki/Balance_of_power_(internatio...

      • unraveller a day ago

        You can't quantify predictable outcomes of business rivalries so well either, but you probably don't want to do away with them for that reason.

  • taurknaut a day ago

    > Click bait headline. Nothing was proven about open source as the future.

    Well sure but generally speaking proofs about reality are an oxymoron, so who on earth was taking the headline at face value to begin with? This is a rhetorical technique referred to as "hyperbole".

    • JumpCrisscross a day ago

      > generally speaking proofs about reality are an oxymoron

      Proofs about reality are not self contradicting. Something not being entirely correct doesn't an oxymoron make.

      • taurknaut a day ago

        > Proofs about reality are not self contradicting.

        Absolutely they are! Proofs are a deductive concept with no basis in reality. This is basic Hume. All we can work with is inductive and abductive reasoning, neither of which is sufficient for a proof.

        • JumpCrisscross a day ago

          > Proofs are a deductive concept with no basis in reality. This is basic Hume.

          One, it's not. Two, you're trying to use Hume to prove a statement that refutes itself. The claim that you can prove proofs oxymoronic is itself an oxymoron.

          Hume's critique of causation, moreover, has been amply supplanted since the 18th century. (Similar to Newton. In parts, it's been buttressed. In others, surpassed.)

          > All we can work with is inductive and abductive reasoning, neither of which is sufficient for a proof

          Mathematically false [1]. (And related to famous Gedankenexperiments, which prompted real science.)

          Of course, this whole thread is a farce: you're purposefully confusing mathematical proofs with the colloquial "proof."

          [1] https://en.wikipedia.org/wiki/Mathematical_induction

          • taurknaut a day ago

            > Mathematically false [1].

            Ok, this has no bearing on our empirical reality.

            • JumpCrisscross 16 hours ago

              You’re rejecting mathematics, empiricism and reality as a foundation for defining proofs and thus truth. At that point, we’re in a Cartesian universe of 1 = 1.

  • tempeler a day ago

    I'm not sure whether it was planned or not. After all, the Chinese are generally proud of their four great inventions, among them the compass and gunpowder. Despite this, I still can't understand why they didn't think of starting geographical exploration and colonization. So I don't know what kind of agenda they have. But I do know one thing: except for China and the US, no one cares who the product comes from. If it's cheap or free, they use it, and no one cares. No one apologizes to the US for losing monopolies.

    • JumpCrisscross a day ago

      > can't understand why they didn't think of starting geographical exploration and colonization

      They did. Just as a land power. Modern China includes conquered territory of the Mongolians, Turkics and Tibeto-Burmans, among others [1].

      (The proximate answer is that the Ming-Qing transition [2] overlapped with the Age of Discovery [3].)

      > except for China and the US, no one cares who the product comes from

      This is breathtakingly wrong, as a simple perusal of every single country's trade restrictions would show. (Even if you're talking about the population versus policy, show me a market where no premium is paid for luxury products imported from such and such distant land.)

      [1] https://en.wikipedia.org/wiki/List_of_ethnic_groups_in_China...

      [2] https://en.m.wikipedia.org/wiki/Transition_from_Ming_to_Qing

      [3] https://en.m.wikipedia.org/wiki/Age_of_Discovery

    • suraci a day ago

      > I still can't understand why they didn't think of starting geographical exploration and colonization

      maybe it's offtopic, but this is what I'm good at, so I'll answer it

      First, ancient China was a feudal centralized dynasty that centered its interests on land and population, unlike commercial company-based regimes such as Britain and the Netherlands. This meant that, in the eyes of the Chinese imperial government, the East India Company was a threat rather than a cooperative partner.

      Another reason is that ancient China was a typical land-based power, surrounded by various forces. It could only maintain its sphere of influence through annexation and the tributary system, without the ability to expand further. (Genghis Khan was the only exception—he carried out invasions but never truly established effective rule.)

      However, ancient China did, to some extent, "colonize" certain Southeast Asian islands. But this was not institutionalized colonization; rather, it was a form of population migration. The central government had no control over these Chinese people venturing into the seas, which is why it repeatedly tried to prevent maritime expansion.

      btw, in case someone brings up Xinjiang and Tibet: based on what I said, you can see it was annexation, not colonization. Those who say otherwise don't understand history outside the West.

    • manquer a day ago

      Why would they want to colonize anyone?

      Only European powers had the urge for colonization; no other civilization in the Americas, Africa or Asia really ever wanted to colonize. Expand, perhaps, but not really colonize.

      There was no economic need to do so. For most of the last three millennia the economic center of the world was India and China; they didn't feel the need to go anywhere. The land was fertile, with large local populations and good weather to grow more than one crop, and rich cultural heritage throughout; there was no payoff for undertaking risky voyages.

      Everyone wanted to trade with them. Colonial powers bombed ports forcing trading agreements, or sold opium and other narcotics to get a foothold, funded expensive expeditions for new trade routes to India, and colonized another continent instead. Most of the era of the industrial revolution was focused on them as the market for European products, not merely resource extraction.

      Similarly, given the people and resources both regions had, there was no need for slavery; that is also primarily a European/Mediterranean thing.

      Not saying workers were or are treated well, or that there was great value placed on human rights in India or China, just that they didn't need to go and find slaves from far off to do the work. They could find all the resources domestically.

      • virissimo a day ago

        > Only European powers had the urge for colonization; no other civilization in the Americas, Africa or Asia really ever wanted to colonize. Expand, perhaps, but not really colonize.

        * Inca Empire: Relocated entire communities (the mitmaqkuna) into new provinces to cement imperial control—these were explicit colonies with an imposed administrative and cultural framework.

        * Ancient Egypt: Occupied Nubia, built forts, stationed garrisons, and imposed Egyptian officials and religion on the local population.

        * Mongol Empire: Installed governors across conquered regions stretching from Eastern Europe to East Asia, moved artisans and workers to bolster Mongol centers, and demanded tribute—hallmarks of a colonial system.

        * Imperial China: Established commanderies in newly acquired territories (e.g., southern China), encouraged Han settlement, and superimposed its bureaucracy over local governance.

        • manquer a day ago

          Colonialism is not the same as imperialism.

          Historians do not consider the Mongol or Inca empires colonial. I would say the Mongols were probably the polar opposite of colonizers: they were extremely open and integrated extremely well into every regional culture they occupied; there were none of the classical markers of colonization.

          I specifically added Mediterranean in my parent post to cover Egyptian, Phoenician and Arab colonization, which are considered examples of pre-modern-era colonizing.

          The hard separation of North Africa is sadly a modern view of the region, which is why I have to say that explicitly; for most of history, empires always held some land on both sides of the Mediterranean. This view is promoted and exploited by the far right in southern Europe to justify many policies.

          • virissimo 5 hours ago

            Your original claim was that "only European powers had the urge for colonization," but now you're citing Arab, Egyptian, and Phoenician examples. Do you see these as exceptions? If so, wouldn't that contradict your original claim? Or are you reconsidering your definition of colonialism—or using "European" in a broader sense (that somehow includes Arabs and Egyptians)?

      • greenleafone7 a day ago

        Yes, the Chinese didn't like colonizing. They actually preferred complete extermination. The West has been much too kind in this regard. Also, slavery is a European thing? Cute! I think indeed we have been much too kind with foreigners; they somehow managed to think that the laws, ethics and technology we gave them are just innate things found in nature, when in fact they are just European culture. Just like exploring the entire planet, cataloging its history and animals. We in fact had an extremely small number of slaves compared to Arabs or Asians, and, to your lament, we ended slavery. Somehow you still found ways to do it to this day though. Additionally, the society with the most slaves in history has been Korea. And the time of us accepting millions of immigrants desperate to either live with us or copy us and then tell us how much greater their own societies are, will end soon. You are free to go and live there with your own people.

        • manquer a day ago

          > West has been much too kind in this regard

          Genocides of first-nation peoples in the Americas, Australia and elsewhere notwithstanding, I suppose

          > laws, ethics and technology we gave them

          Unasked and unwanted "civilizing" by European powers is what got us the Congo Free State and dozens of other atrocities, all in the name of "civilizing". It is not as if the rest of the world was living in trees with no laws and morality.

          > Most slaves in history has been Korea

          This is a controversial view of Korea. There is no consensus on whether nobi and the class system during the Joseon period (much less so in the Goryeo period) amounted to serfdom or slavery; that is not an easy classification to make, given that they had many rights, many earned salaries, and nobi women in the 1400s got 100 days of maternity leave by law, a lot more than modern American women do today.

          Even if we assume they were all slaves, Korea was by no means the leading country by % of population. We also have to consider that nobi were largely ethnic Koreans, not foreigners explicitly captured to be slaves, and the economy didn't run on the continuous capture of foreign slaves.

          > us accepting millions of immigrants desperate to either live with us or copy us and then tell us how much greater their own societies are, will end soon. You are free to go and live there with your own people.

          While there is a discourse to be had on socio-economic policies in the West, from repatriation of cultural artifacts to climate change or geopolitics that could stabilize the global south and reduce immigration, at this point I have to stop engaging.

      • JumpCrisscross a day ago

        > the economic center of the world has been India and China , they didn’t feel the need to go anywhere

        You're describing two modern states that encompass geographies that were constantly in internal turmoil. (Including as empires [1].) It's like asking why the Germans were late to the game in colonising: they're a land power and were in a constant state of internal turmoil.

        "They had enough" flies in the face of human history and European colonialism itself.

        [1] https://en.wikipedia.org/wiki/List_of_Hindu_empires_and_dyna...

        • manquer a day ago

          I didn’t mean to say They had enough to mean they were satiated , it was supposed to mean they had enough in their own regions to fight , win and enjoy over they didn’t need to go overseas to acquire riches .

          Neither region is a utopia, in history or today. There was simply enough land, people, and other resources within, so they viewed their region as the world; there was no economic impetus to colonize or enslave from far-off places, is my point.

      • paulddraper a day ago

        > no other civilization in Americas, Africa or Asia really ever want to colonize

        Barely warrants a response, but

        https://en.wikipedia.org/wiki/Japanese_colonial_empire

        • manquer a day ago

          We are talking ancient history? Not Japan post-Meiji restoration trying to copy and catch up to world powers after stagnating during the Tokugawa shogunate for centuries.

          After the sengoku jidai [1], the failed Imjin wars under Toyotomi Hideyoshi were the only serious attempt to expand into China and Korea. They of course failed, and Japan turned inward until the Meiji period, as was typical of most of their history.

          Post-Meiji restoration is hardly a fair comparison; the Japanese believed they had to be like the other world (colonial) powers to be powerful.

          [1]Unrelated note: one of my favorite periods in history.

          • paulddraper a day ago

            > We are talking ancient history ?

            I'm not. And I didn't think you were either.

            But if we are, the Arab colonization of the Middle East + North Africa has to rank among the most dominant of all time, yes? Still apparent to this day.

            • manquer a day ago

              Yes, the Arab, Phoenician and Egyptian empires are all classified as early colonizers and/or slavers. I added the qualifier European/Mediterranean hoping to signal I covered them as well, but that doesn't seem to be coming across.

              Let me put it another way: sub-Saharan African, Chinese, southeast/far-east Asian, Indian, North/South American (first nations), Polynesian empires etc. largely did not do empire building via colonization or slavery.

              This is not to say they valued human life or did not commit atrocities. It just means that economic models that necessitated colonies for resource extraction, large markets to sell to, or foreign slaves for labor never evolved there, owing to environmental, population and cultural factors, so colonization was an atypical response when empires were built there.

              We can observe the difference today in how, say, China deals with foreign investments, loans and other development initiatives compared to how Western powers do. The deals tend to be primarily economic, with a willingness to work with existing regimes, fewer non-economic conditions attached, and so on.

    • corimaith a day ago

      Have you looked at the map of China in the past? Much of the West and North was only recently conquered in the same time period as Colonialism, and the South prior to that. Xinjiang literally means "New Frontier", and the ongoing tensions can be viewed as the continuation of such colonialism in modern times.

    • logicchains a day ago

      >Despite this, I still can't understand why they didn't think of starting geographical exploration and colonization

      They literally had emperors who banned all overseas travel because it represented a threat to their own power: https://en.m.wikipedia.org/wiki/Haijin . China is extremely large and extremely centralised, so the rulers' primary focus has always been on maintaining their own power. Fortunately the current government still allows private firms enough freedom that one was able to invent DeepSeek; however, if the recent crackdown on financial firms had happened a few years earlier, the firm behind DeepSeek wouldn't have had the money to fund its creation.

  • checker659 a day ago

    Who's going to invest in AI (building foundation models) if some other company can come and dethrone you in a snap?

    • karamanolev a day ago

      OpenAI invested in non-open foundation models and DeepSeek came and (approximately) dethroned them. I don't agree with the conclusion of the article, but the statement "open-sourcing a model lets another company dethrone you" is also not great. I'd stand behind "in the world of AI in 2024/2025, regardless of whether the model is open or not, someone is likely to come and dethrone you in a snap".

      • gs17 a day ago

        > DeepSeek came and (approximately) dethroned them

        The parenthetical should really be "(temporarily, approximately)". I wouldn't count OpenAI out until we see how o3 compares, assuming they actually make it available this week.

        • karamanolev a day ago

          Oh, agree. The throne has a new king every day. Sometimes a past king, sometimes a brand new one. The temporary nature of it was implied.

    • goosejuice a day ago

      I guess we'll find out once someone gets dethroned. As far as I'm aware that hasn't happened.

    • moduspol a day ago

      Maybe "dethrone" isn't the right word, but if a startup on the other side of the world, without the best hardware, can create something comparable and cheaper to build/run just four months after the release of the top company's flagship model: I don't understand the OpenAI business model.

      I'm with you. How are they going to make money?

    • deadbabe a day ago

      There is no throne, it’s a game of musical chairs.

  • esnard a day ago

    To be fair, the article doesn't even contain the word "prove".

  • zx10rse a day ago

    It was proven and it is the future.

  • Onavo a day ago

    His open source company is copyleft unlike DeepSeek, and the application domain is specifically designed such that no company would use his "open source" product.

DrBenCarson a day ago

DEEPSEEK IS NOT OPEN SOURCE, THEY JUST PUBLISHED THE WEIGHTS

  • dang a day ago

    "Please don't use uppercase for emphasis. If you want to emphasize a word or phrase, put asterisks around it and it will get italicized."

    https://news.ycombinator.com/newsguidelines.html.

    • otterley a day ago

      In this case, I think yelling louder is useful. We need to band together to eliminate this false and misleading appellation.

      We have had a term to describe this kind of software for decades: "freeware." That's what this and all other "free to download and use" offerings are; they are not open source under any commonly-understood meaning prior to last year.

  • maxloh a day ago

    I like looneysquash's viewpoint about the definition of open source AI. You will need to have all parts involved open-sourced to make a model "open", not just the weights:

    > The trained model is object code. Think of it as Java byte code. You have some sort of engine that runs the model. That's like the JVM, and the JIT. And you have the program that takes the training data and trains the model. That's your compiler, your javac, your Makefile and your make. And you have the training data itself, that's your source code.

    > Each of the above pieces has its own source code. And the training set is also source code. All those pieces have to be open to have a fully open system. If only the training data is open, that's like having the source, but the compiler is proprietary. If everything but the training set is open, well, that's like giving me gcc and calling it Microsoft Word.

    https://news.ycombinator.com/item?id=41952722

    • visarga a day ago

      > You will need to have all parts involved open-sourced to make a model "open", not just the weights

      How do you propose to open source terabytes of web-scraped text? They give you what they can give you: paper, code, model weights. You can reimplement the code, while the weights are open for you to do what you like with them.

  • tgtweak a day ago

    I think they also published the training methodology, which others have reproduced, no? The only thing I'm not sure about is whether their low-level Nvidia PTX training code was released under the license. But in order for a third party to corroborate the training and testing, they would need to have that code (and likely the training data as well), would they not?

    • jfarina a day ago

      They outlined the methodology. They didn't publish their code or the training set.

      • cruffle_duffle a day ago

        How could they publish the terabytes of training data? A million RAR files?

        Honestly would that part even be useful? Like I want to know how they did the training so I can repro it with my own set of training data, right?

        I mean, isn't that the future? Somebody figures out how to do P2P distributed training and groups can crawl the web training their own open source models?

        • tgtweak a day ago

          I'd torrent it :D

  • thayne a day ago

    True. But at the same time, it is more open than "Open" AI. Or even LLAMA.

  • iab a day ago

    Oh my gosh THANK YOU - a repository of paper images and weights is not open source

pointedAt a day ago

wait, this ain't another whitelabel OpenAI ChatGPT-oOPs cosplay?

  • drakythe a day ago

    Between the ability of DS R1 to be run offline in ollama and OpenAI publicly kvetching that DS might have "stolen" their data (hahahahahahahahahahahahahaha), I'm pretty sure this isn't just some GPT pass-through like other LLM cons of the past. (Not to mention DSv3 was released in December and no one has claimed it is a pass-through either.)